Compiler-assisted energy optimization for clustered VLIW processors

نویسندگان

  • Rahul Nagpal
  • Y. N. Srikant
چکیده

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires having high load capacitance which leads to delay in execution and significantly high energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse and limit the usability of clustered architectures in smaller technologies. However, technological advancements now permit design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and communication slack of data values to exploit the low power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously save energy in functional units and interconnects to improve the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with the conservative estimates of contribution of functional unit and interconnect to overall processor energy consumption, the proposed combined scheme obtains on an average 8% and 10% improvement in overall energy-delay Email addresses: [email protected] (Rahul Nagpal), [email protected] (Y. N. Srikant) 1Corresponding Author. Address: Dept of CSA, Indian Institute of Science, Banagalore, Karnataka, India 560012 Preprint submitted to Journal of Parallel and Distributed Computing April 12, 2012 product with 3.5% and 2% performance degradation for a 2-clustered and a 4clustered machine respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Compiler-assisted power optimization for clustered VLIW architectures

Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler ...

متن کامل

Thesis - Vasileios Porpodas

Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for its operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitio...

متن کامل

Exploring Energy-Performance Trade-Offs for Heterogeneous Interconnect Clustered VLIW Processors

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making design simpler, it introduces extra overheads by way of inter-cluster communication. This communication ...

متن کامل

An Advanced Compiler Designed for a VLIW DSP for Sensors-Based Systems

The VLIW architecture can be exploited to greatly enhance instruction level parallelism, thus it can provide computation power and energy efficiency advantages, which satisfies the requirements of future sensor-based systems. However, as VLIW codes are mainly compiled statically, the performance of a VLIW processor is dominated by the behavior of its compiler. In this paper, we present an advan...

متن کامل

Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures

Clustering has become a common trend in very long instruction words (VLIW) architecture to solve the problem of area, energy consumption, and design complexity. Register-file-connected clustered (RFCC) VLIW architecture uses the mechanism of global register file to accomplish the inter-cluster data communications, thus eliminating the performance and energy consumption penalty caused by explici...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • J. Parallel Distrib. Comput.

دوره 72  شماره 

صفحات  -

تاریخ انتشار 2012